 exponential convergence rate


Bayesian Optimization with Exponential Convergence

Neural Information Processing Systems

This paper presents a Bayesian optimization method with exponential convergence without the need of auxiliary optimization and without the δ-cover sampling. Most Bayesian optimization methods require auxiliary optimization: an additional non-convex global optimization problem, which can be time-consuming and hard to implement in practice. Also, the existing Bayesian optimization method with exponential convergence [1] requires access to the δ-cover sampling, which was considered to be impractical [1, 2]. Our approach eliminates both requirements and achieves an exponential convergence rate.


Convergence Rate in Nonlinear Two-Time-Scale Stochastic Approximation with State (Time)-Dependence

arXiv.org Artificial Intelligence

The nonlinear two-time-scale stochastic approximation is widely studied under conditions of bounded variances in noise. Motivated by recent advances that allow for variability linked to the current state or time, we consider state- and time-dependent noise. We show that the Lyapunov function exhibits polynomial convergence rates in both cases, with the rate of polynomial decay depending on the parameters of the state- or time-dependent noise. Notably, if the state noise parameters fully approach their limiting value, the Lyapunov function achieves an exponential convergence rate. We provide two numerical examples to illustrate our theoretical findings in the context of stochastic gradient descent with Polyak-Ruppert averaging and stochastic bilevel optimization.
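As a rough illustration of the first numerical setting mentioned in the abstract above, stochastic gradient descent with Polyak-Ruppert averaging can be read as a two-time-scale scheme: a fast iterate driven by noisy gradients and a slow running average of that iterate. The sketch below is not the paper's experiment. The quadratic objective, the constant-variance Gaussian noise, the step-size schedule, and the function name `sgd_with_polyak_ruppert` are all assumptions chosen for brevity.

```python
import random


def sgd_with_polyak_ruppert(n_steps=20000, target=3.0, noise_std=1.0, seed=0):
    """Minimize f(x) = 0.5 * (x - target)**2 from noisy gradients.

    Returns the last SGD iterate and its Polyak-Ruppert (running) average.
    All constants are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    x = 0.0      # fast iterate
    x_bar = 0.0  # slow iterate: running average of the fast one
    for k in range(n_steps):
        # Noisy gradient: exact gradient plus zero-mean Gaussian noise.
        grad = (x - target) + rng.gauss(0.0, noise_std)
        # Fast time scale: step size decaying like k^(-0.6).
        x -= 0.5 / (k + 1) ** 0.6 * grad
        # Slow time scale: averaging step size 1 / (k + 1).
        x_bar += (x - x_bar) / (k + 1)
    return x, x_bar


if __name__ == "__main__":
    last, averaged = sgd_with_polyak_ruppert()
    print(f"last iterate: {last:.4f}, averaged iterate: {averaged:.4f}")
```

Here the noise variance is bounded and state-independent, which is the classical setting the abstract contrasts with; replacing `noise_std` by a quantity depending on `x` or `k` would give a state- or time-dependent variant.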


Exponential convergence rate for Iterative Markovian Fitting

arXiv.org Artificial Intelligence

Two distributions µ, ν ∈ P(X) with everywhere positive density are given. Recently the IMF algorithm [4] was proposed to solve problem (1); it consists of successive transformations interpreted as projections onto the sets of Markov and q-reciprocal processes (see [3, §2.5]). Here we prove exponential convergence of IMF for the first time. We rely on the convergence analysis of iterations [1] minimizing a strongly convex function with a Lipschitz gradient. We recall from [3, Theorem 3.1] that the solution p … The work was supported by the grant for research centers in the field of AI provided by the Ministry of Economic Development of the Russian Federation in accordance with the agreement 000000C313925P4F0002 and the agreement with Skoltech 139-10-2025-033.


Reviews: Maximum Entropy Monte-Carlo Planning

Neural Information Processing Systems

This paper proposes a new MCTS algorithm, Maximum Entropy for Tree Search (MENTS), which combines the maximum entropy policy optimization framework with MCTS for more efficient online planning in sequential decision problems. The main idea is to replace the Monte Carlo value estimate with the softmax value estimate used in the maximum entropy policy optimization framework, so that the state value can be estimated and back-propagated more efficiently in the search tree. Another main novelty is an optimal algorithm, Empirical Exponential Weight (E2W), which serves as the tree policy and explores more. The paper shows that MENTS achieves an exponential convergence rate towards finding the optimal action at the root of the tree, which is much faster than the polynomial convergence rate of the UCT method. The experimental results also demonstrate that MENTS performs significantly better than UCT in terms of sample efficiency, in both synthetic problems and Atari games.
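To make the "softmax value estimate" in the review above concrete, here is a minimal sketch of the standard maximum entropy quantities: the soft value as a temperature-scaled log-sum-exp of action values, and the matching Boltzmann policy. The function names, the temperature `tau`, and the uniform-mixing helper with its weight `lam` are illustrative assumptions. The helper only gestures at how a tree policy might force extra exploration and is not claimed to be the paper's exact E2W rule.

```python
import math


def softmax_value(q_values, tau):
    """Soft value: tau * log(sum(exp(q / tau))), computed stably."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def softmax_policy(q_values, tau):
    """Boltzmann policy whose probabilities are exp((q - V_soft) / tau)."""
    v = softmax_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]


def mix_with_uniform(policy, lam):
    """Hypothetical exploration step: blend a policy with the uniform one."""
    n = len(policy)
    return [(1.0 - lam) * p + lam / n for p in policy]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    print(softmax_value(q, tau=0.1))
    print(mix_with_uniform(softmax_policy(q, tau=0.1), lam=0.2))
```

As `tau` shrinks, the soft value approaches the maximum action value; a Monte Carlo backup would instead average sampled returns.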


Bayesian Optimization with Exponential Convergence (Kenji Kawaguchi, Leslie Pack Kaelbling, Tomás Lozano-Pérez; MIT)

Neural Information Processing Systems

This paper presents a Bayesian optimization method with exponential convergence without the need of auxiliary optimization and without the δ-cover sampling. Most Bayesian optimization methods require auxiliary optimization: an additional non-convex global optimization problem, which can be time-consuming and hard to implement in practice. Also, the existing Bayesian optimization method with exponential convergence [1] requires access to the δ-cover sampling, which was considered to be impractical [1, 2]. Our approach eliminates both requirements and achieves an exponential convergence rate.


Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning

arXiv.org Machine Learning

We consider the gradient descent flow widely used for the minimization of the $\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two modified versions; one adapted for the overparametrized setting, and the other for the underparametrized setting. Both have a clear and natural invariant geometric meaning, taking into account the pullback vector bundle structure in the overparametrized, and the pushforward vector bundle structure in the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry.


Elliptic PDE learning is provably data-efficient

arXiv.org Artificial Intelligence

PDE learning is an emerging field that combines physics and machine learning to recover unknown physical systems from experimental data. While deep learning models traditionally require copious amounts of training data, recent PDE learning techniques achieve spectacular results with limited data availability. Still, these results are empirical. Our work provides theoretical guarantees on the number of input-output training pairs required in PDE learning. Specifically, we exploit randomized numerical linear algebra and PDE theory to derive a provably data-efficient algorithm that recovers solution operators of 3D uniformly elliptic PDEs from input-output data and achieves an exponential convergence rate of the error with respect to the size of the training dataset with an exceptionally high probability of success.


A Case of Exponential Convergence Rates for SVM

arXiv.org Artificial Intelligence

Classification is often the first problem described in introductory machine learning classes. Generalization guarantees of classification have historically been offered by Vapnik-Chervonenkis theory. Yet those guarantees are based on intractable algorithms, which has led to the theory of surrogate methods in classification. Guarantees offered by surrogate methods are based on calibration inequalities, which have been shown to be highly sub-optimal under some margin conditions, falling short of capturing exponential convergence phenomena. Those "super" fast rates are becoming well understood for smooth surrogates, but the picture remains blurry for non-smooth losses such as the hinge loss, associated with the renowned support vector machines. In this paper, we present a simple mechanism to obtain fast convergence rates and we investigate its usage for SVM. In particular, we show that SVM can exhibit exponential convergence rates even without assuming the hard Tsybakov margin condition.


Disambiguation of weak supervision with exponential convergence rates

arXiv.org Artificial Intelligence

In many applications of machine learning, such as recommender systems, where an input characterizing a user should be matched with a target representing an ordering of a large number of items, accessing fully supervised data is not an option. Instead, one should expect weak information on the target, which could be a list of items previously taken (if items are online courses), watched (if items are plays), etc., by a user characterized by the feature vector. This motivates weakly supervised learning, which aims at learning a mapping from inputs to targets in a setting where tools from supervised learning cannot be applied off the shelf. Recent applications of weakly supervised learning showcase impressive results in solving complex tasks such as action retrieval on instructional videos (Miech et al., 2019), image semantic segmentation (Papandreou et al., 2015), salient object detection (Wang et al., 2017), 3D pose estimation (Dabral et al., 2018), and text-to-speech synthesis (Jia et al., 2018), to name a few. However, those applications of weakly supervised learning are usually based on clever heuristics, and theoretical foundations of learning from weakly supervised data are scarce, especially when compared to the statistical learning literature on supervised learning (Vapnik, 1995; Boucheron et al., 2005; Steinwart and Christmann, 2008). We aim to provide a step in this direction. In this paper, we focus on partial labelling, a popular instance of weak supervision, approached from a structured prediction point of view (Ciliberto et al., 2020). We detail this setup in Section 2. Our contributions are organized as follows.


Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors

arXiv.org Machine Learning

We consider stochastic gradient descent for binary classification problems in a reproducing kernel Hilbert space. In traditional analysis, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. Therefore, it is important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but theoretical analysis of this was limited to the square loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show an exponential convergence rate of the expected classification error in the final phase of learning for a wide class of differentiable convex loss functions under similar assumptions.
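The setting in the abstract above, stochastic gradient descent in a reproducing kernel Hilbert space with a differentiable convex loss, can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the Gaussian kernel and its bandwidth, the logistic loss, the constant step size, the small regularization, and the one-dimensional data whose two classes are separated by a gap around zero (a crude stand-in for a strong low-noise condition). The function names are hypothetical and the code does not reproduce the paper's algorithm or analysis.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=0.2):
    """Gaussian (RBF) kernel on the real line."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def predict(points, alphas, x):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * gaussian_kernel(p, x) for p, a in zip(points, alphas))


def kernel_sgd_logistic(n_steps=300, step=0.5, reg=1e-3, seed=0):
    """Functional SGD on the regularized logistic loss in an RKHS."""
    rng = random.Random(seed)
    points, alphas = [], []
    for _ in range(n_steps):
        # Toy data: label y in {-1, +1}, input x = y * u with u in [0.2, 1].
        y = rng.choice([-1.0, 1.0])
        x = y * rng.uniform(0.2, 1.0)
        margin = y * predict(points, alphas, x)
        # Regularization shrinks all existing coefficients.
        alphas = [(1.0 - step * reg) * a for a in alphas]
        # Logistic loss l(z) = log(1 + exp(-z)) has l'(z) = -1 / (1 + exp(z)),
        # so the gradient step adds a kernel bump at x with this coefficient.
        points.append(x)
        alphas.append(step * y / (1.0 + math.exp(margin)))
    return points, alphas


if __name__ == "__main__":
    pts, coef = kernel_sgd_logistic()
    for x in (-0.6, 0.6):
        print(x, predict(pts, coef, x))
```

The classification error concerns only the sign of `predict`, which can become correct long before the logistic risk itself is close to its minimum; that gap is the phenomenon the abstract refers to.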